-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenMP in Pyclaw #527
base: master
Are you sure you want to change the base?
OpenMP in Pyclaw #527
Conversation
…lit algorithm. The unsplit algorithm can also be updated. Usage requires setting the environment variable OMP_NUM_THREADS before program execution. In addition the gfortran flag '-fopenmp' and the f2py flag '-lgomp' are required when compiling step3ds.f90. Added F90 and f2py options for OpenMP (-fopenmp, and gomp => -lgomp) in classic/setup.py so that when compiling 'classic3.so' the openmp lines are compiled.
…e in a dimension size in order to trick f2py into not making nthreads an optional argument. Otherwise it used nthreads=shape(qadd,2) as a size. Since qadd is optional, it get a bad value when qadd is not in the call to step3ds(), which is the default for pyclaw.
src/pyclaw/classic/solver.py
Outdated
self.aux2 = np.empty((num_aux,maxm+2*num_ghost,3),order='F') | ||
self.aux3 = np.empty((num_aux,maxm+2*num_ghost,3),order='F') | ||
mwork = (maxm+2*num_ghost) * (31*num_eqn + num_waves + num_eqn*num_waves) | ||
print "nthreads:",self.nthreads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably want to remove the print line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree. It was left from debugging.
We should modify the travis test case so that it tests this code (set |
The line with |
Yeah, f2py is a bit annoying like that. In any case, if an array enters a parallel region and forks to the threads it should make copies of anything in the stack unless otherwise specified (via a |
removed debugging print statement
Are you wondering if we can query the system to find an optimal number of threads? Or maybe setting the threads via program input rather than using the environment variable? We could set input on the python side with something like this: |
There are a number of ways to set the number of threads that OpenMP can use beyond just the environment variable. In the OpenMP API there is a way to query this but I am not sure how to do this in Python. |
I did a bit of looking into this but could not find a good way to call |
This is great -- thanks, @weslowrie ! My main question is: does this cost us anything when running without OpenMP? I see two potential concerns:
@mandli I guess the first thing to do is just try a PETSc+OpenMP run. @weslowrie have you tried that already by any chance? |
I'm not sure about either of these, probably worth some tests. I think it is possible to NOT include the 'gomp' library while still having gcc, so some systems might not have the proper openmp library. I recently noticed that the
I have not tried the MPI+OpenMP combo yet. I might be able to run a test on a machine with multiple nodes to see if it works. I'll let you know if I am able to successfully test this. |
I'm trying to have a default behavior when the openmp env variable I'm not sure how to handle this yet. A child process cannot change the shell env variable, which is what openmp looks for. I also did not see an obvious way to override openmp's decision when |
|
@weslowrie Some testing is being done now by @aymkhalil -- we're going to see if we can run with MPI+OpenMP on the new supercomputer here, and if it helps over just vanilla MPI. I'm pretty sure we'll want to merge this in, so could you go ahead and add the OMP modifications to step3.f90 too? |
@ketch Yes no problem. I'll update step3.f90 as well. We should find a resolution to the case where Also, in the python setup.py do we need to have a way to skip |
Can you just use the
pre-processing macro? This is defined by OpenMP standard. See http://stackoverflow.com/questions/1300180/ignore-openmp-on-machine-that-doesnt-have-it and http://bisqwit.iki.fi/story/howto/openmp/#Discussion These are C/C++ macros, but I would imagine that some thing similar is available in Fortran. |
It is but I think @weslowrie needs it in Python which we have not been able to figure out beyond running a call through C or Fortran. |
…of CPUs when the OMP_NUM_THREADS environment varibale is unset. This is consistent with what OpenMP returns for omp_get_num_threads() when the env variable is unset. Updated step3.f90 to use OpenMP, based on what is done in clawpack/classic. This was sucessfully tested with the Sedov.py test problem in the euler_3d example folder.
…nto pyclaw_openmp
I have added a modified step3.f90 to include OpenMP as done in classic clawpack. I tested this with the euler_3d, Sedov.py and the regression test within the folder. OpenMP appears to be working properly. |
Good solution! |
I may be wrong about this but if someone were to not set |
@mandli yes you are correct, and it is a problem. Not only are the aux arrays larger, but the work array is larger as well set in How does the PyClaw setup typically deal with custom compiler flags? Should OpenMP be the default, or should we expect the user to set custom flags while compiling/installing PyClaw? |
The flags are usually in the setup.py files in the relevant src directories. Interesting question about the default though, should OpenMP be enabled and use all of the cores on a machine if someone did not set |
@mandli: Don't AMRClaw and GeoClaw use OpenMP now? How are these issues handled there? |
…arning that the 'OMP_NUM_THREADS' environment varibale is unset. This allows an unaware user to run their codes without any changes, and they will see the warning and can adjust accordingly.
I made some changes, and I think it gives a desirable default behavior. The python code checks if the I think this gives a desirable default behavior as an unsuspecting user will only see a warning about the unset env variable, and can set it if they want. If one sets the env variable, then it just does the calculation with set number of threads. This way we don't have any arrays that are larger than necessary. Possible last fix, and I don't know how to do this: |
I just cannot remember if warnings are printed to the console or just to the log file if using the standard PyClaw loggers. Speaking of that you also might want to instead use the logger attached to the solver instead. You can use it with self.logger.warning(warning_stuff) |
…console and to the log file.
@mandli I modified it so it uses the PyClaw logger as suggested. It writes the warning to the console and the log file with this method. I did have to move the code below the |
I would perhaps send the message to a level that is not sent to the console so as to not worry users who are not concerned with such things. This should still send the message to the log file though so astute users can look there. |
@mandli Do you mean setting it to the |
I tried a PETSc + OpenMP run, and superficially it looks like it is working well. I did not time the runs, but it was a noticeable speedup. I ran on 4 nodes (4 MPI jobs), with 4 OpenMP threads per node. The machine I used has 4 CPUs per node. |
Yes to the logger level. Which is not being sent to the logfile? |
If I modify the code to: |
Huh, I thought that one of the levels got sent only to the file. Must have been wrong. I would avoid changing the logging perhaps. |
@mandli Do you think it is worth investigating other solutions to the warning output/logging? Or just leave this as-is? |
@weslowrie I think at this point it is fine as-is |
…lit algorithm. The unsplit algorithm can also be updated. Usage requires setting the environment variable OMP_NUM_THREADS before program execution. In addition the gfortran flag '-fopenmp' and the f2py flag '-lgomp' are required when compiling step3ds.f90. This is done for a custom version of classic3.so, and might also be necessary for the general version.
…so that when compiling 'classic3.so' the openmp lines are compiled.
…of CPUs when the OMP_NUM_THREADS environment varibale is unset. This is consistent with what OpenMP returns for omp_get_num_threads() when the env variable is unset. Updated step3.f90 to use OpenMP, based on what is done in clawpack/classic. This was sucessfully tested with the Sedov.py test problem in the euler_3d example folder.
…arning that the 'OMP_NUM_THREADS' environment varibale is unset. This allows an unaware user to run their codes without any changes, and they will see the warning and can adjust accordingly.
…console and to the log file.
…law_openmp Update again.
This is already in a PR, the question is whether the default method in that example should also be changed. |
26f6238
to
0b595de
Compare
…s using hdf5" This reverts commit 0b595de.
…ch from dimensional split to non-dimensional split solver.
I have updated step3ds.f90 in PyClaw to be able to use openmp. Some small changes were necessary to classic/solver.py to get the openmp environment variable OMP_NUM_THREADS and to properly set the size of the work array. Also the call to step3ds() was changed slightly such that there are not assumed size arrays (f2py does not like these; nthreads is passed in instead)
Additionally classic/setup.py was modified to explicitly add the '-fopenmp' f90 compiler flag as well as adding the 'gomp' library. I'm not sure if this is the best way to handle this, but it does work.
I have not added the openmp directive to step3.f90, but it should be mostly trivial at this point.